
********************************************************************************
* Create a dataset with basic individual characteristics for the full Swedish 
* population 2010 (used to calculate averages among families):
********************************************************************************

* Retrieve table "individ_2010" from SQL database
odbc load, exec("select * from individ_2010")dsn("P0846") clear

* Save raw data so I do not need to use odbc each time (which is slow)
save Data/ODBC_individ_2010, replace
use Data/ODBC_individ_2010, clear
 
keep P0846_LopNr_personnr P0846_lopnr_PeOrgNr P0846_lopnr_CfarNr Civil FodelseAr Kon LoneInk Sun*niva AstKommun Forsamling Barn*

* There are a few duplicates with equal values of all variables.
* Keep the first occurrence in these cases
duplicates drop

* Use better variable names
rename P0846_LopNr_personnr PersonId
rename P0846_lopnr_CfarNr WorkplaceId

* Not a valid WorkplaceId 
replace WorkplaceId = . if WorkplaceId ==  80491
egen num_of_workers = count(PersonId) if WorkplaceId != ., by(WorkplaceId)

* A measure of being employed based on income (income is measured in 100s of SEK)
rename LoneInk Income
destring Income, replace
hist Income if Income > 1000 & Income < 10000
gen Employed = Income > 1000 & Income != .
tab Employed, miss

* Gender
gen Female = Kon == "2"
tab Female, miss
drop Kon

* Age
destring FodelseAr, replace
gen Age2010 = 2010 - FodelseAr
drop FodelseAr

* Education based on Sun2000niva (see LISA documentation)
tab Sun*niva, miss
destring Sun*niva, replace
rename Sun*niva sunnivå
gen edu = trunc(sunnivå/100)
tab edu, miss
gen EduLessHighSchool = edu <= 2 
gen EduHighSchool = edu >= 3 & edu != 9
gen EduCollege = edu >= 4 & edu != 9
gen EduMiss = edu == 9
drop edu
drop sunnivå

* Married 
tab Civil, miss 
replace Civil = trim(Civil)
gen Married = Civil == "G" | Civil == "RP"
tab Married, miss
drop Civil

* Number of children
egen NumberOfChildren = rowtotal(Barn0_3 Barn11_15 Barn16_17 Barn18_19 Barn20plus Barn4_6 Barn7_10)
tab NumberOfChildren
drop Barn*

* Munic
destring AstKommun, replace

* Parish
destring Forsamling, replace

compress
save Data/data_ind_X_2010, replace



